# Document Image Parsing
VL3 SigLIP NaViT
Apache-2.0
The visual encoder for VideoLLaMA3. It uses Any-resolution Vision Tokenization (AVT) to dynamically process images and videos of varying resolutions.
Text-to-Image
Transformers English

DAMO-NLP-SG
25.55k
8
Model3
MIT
A document image understanding model fine-tuned from naver-clova-ix/donut-base-finetuned-cord-v2.
Image-to-Text
Transformers

sunilsai
13
0
Donut Base Medical Handwritten Blocks Data Extraction
MIT
A model based on the Donut architecture, designed for extracting structured data from handwritten medical documents.
Text Recognition
Transformers

mjawadazad2321
15
1
Featured Recommended AI Models